Workload optimized server for intelligent algorithm trading platforms

ABSTRACT

Systems and methods for a workload optimized server for intelligent algorithm trading platforms. In an illustrative, non-limiting embodiment, an Information Handling System (IHS) may include a plurality of Central Processing Units (CPUs) and a control circuit coupled to the plurality of CPUs, the control circuit having a memory configured to store program instructions that, upon execution by the control logic, cause the IHS to: set a first number of enabled cores in a first CPU to operate with a first all-core turbo frequency, and set a second number of enabled cores in a second CPU to operate with a second all-core turbo frequency, where the first number of enabled cores is different from the second number of enabled cores, and where at least one of the first or second all-core turbo frequencies is selected to cause the IHS to operate with reduced execution jitter.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing date of U.S. patent application Ser. No. 14/483,597, which is titled “Workload Optimized Server for Intelligent Algorithm Trading Platforms” and was filed Sep. 11, 2014, the disclosure of which is hereby incorporated by reference herein in its entirety. This application is a reissue of, and claims the benefit of the filing date of, U.S. patent application Ser. No. 15/456,457, filed on Mar. 10, 2017, now U.S. Pat. No. 10,198,296; which in turn is a continuation of U.S. patent application Ser. No. 14/483,597, titled “WORKLOAD OPTIMIZED SERVER FOR INTELLIGENT ALGORITHM TRADING PLATFORMS” and filed on Sep. 11, 2014, now U.S. Pat. No. 9,619,289; the disclosures of which are hereby incorporated by reference herein in their entireties.

FIELD

This disclosure relates generally to computer systems, and more specifically, to systems and methods for a workload optimized server for intelligent algorithm trading platforms.

BACKGROUND

As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option is an Information Handling System (IHS). An IHS generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes. Because technology and information handling needs and requirements may vary between different applications, IHSs may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in IHSs allow for IHSs to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, global communications, etc. In addition, IHSs may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.

An IHS may be designed with a multi-core processor. A multi-coreprocessor is a single computing component with two or more independentprocessor cores that are able to read and execute program instructionsor software code. These multiple cores can run multiple instructionsconcurrently, thus increasing the overall processing speed for programs.Multiple cores typically are integrated onto a single integrated circuitdie or integrated circuit or onto multiple dies in a single chippackage, generally referred to as the IHS's Central Processing Unit(CPU). In some cases, a single IHS may include two or more multi-coreprocessors. A multiprocessor IHS is a computing system fitted with twoor more CPUs, where each CPU may include two or more processing cores.

SUMMARY

Embodiments of systems and methods for a workload optimized server for intelligent algorithm trading platforms are described herein. In an illustrative, non-limiting embodiment, an Information Handling System (IHS) may include a plurality of Central Processing Units (CPUs); and a control circuit coupled to the plurality of CPUs, the control circuit having a memory configured to store program instructions that, upon execution by the control logic, cause the IHS to: set a first number of enabled cores in a first CPU to operate with a first all-core turbo frequency and set a second number of enabled cores in a second CPU to operate with a second all-core turbo frequency, where the first number of enabled cores is different from the second number of enabled cores, and where at least one of the first or second all-core turbo frequencies is selected to cause the IHS to operate with reduced execution jitter.

For example, the control circuit may include basic input/output system (BIOS) logic, a first portion of an application may be run by the first CPU, and a second portion of the application may be run by the second CPU. The first portion of the application may include frequency-sensitive threads and the second portion of the application may include parallel-execution-sensitive threads. In some cases, the application may include a high frequency trading application, the first portion may include a feed handling or trading platform, and the second portion may include an analytics platform.

The first number of enabled cores may be smaller than the second number of enabled cores, and the first all-core turbo frequency may be greater than the second all-core turbo frequency. The program instructions, upon execution by the control logic, may further cause the IHS to select the first and second all-core turbo frequencies using a turbo boost frequency table. The first all-core turbo frequency may be the highest frequency available for the first number of enabled cores in the turbo boost frequency table.

In some implementations, the program instructions, upon execution by the control logic, further cause the IHS to change at least one of the first number of enabled cores or the second number of enabled cores, and change at least one of the first or second all-core turbo frequencies to reduce the execution jitter. Additionally or alternatively, the program instructions may further cause the IHS to schedule all Advanced Vector Extensions (AVX) threads on the second CPU instead of the first CPU.

In another illustrative, non-limiting embodiment, a computer-implemented method may include receiving an indication of: (a) a first number of enabled cores in a first processor of a multi-processor IHS chosen to execute a first part of an application; (b) a second number of enabled cores in a second processor of the multi-processor IHS chosen to execute a second part of the application; and (c) a type of instruction to be executed within the first or second parts of the application; and selecting a first all-core turbo frequency of the first processor and a second all-core turbo frequency of the second processor to reduce an execution jitter of the application during concurrent execution of the first and second parts, the selection based upon the first number of enabled cores, the second number of enabled cores, and the type of instruction.
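
The frequency selection described above can be sketched as a table lookup keyed by instruction type and number of enabled cores. This is a minimal Python sketch under stated assumptions: the table contents, the `"avx"`/`"non_avx"` keys, and the function names are hypothetical, and a real turbo boost frequency table would come from the processor vendor's documentation for a specific SKU.

```python
# Hypothetical turbo boost frequency table: for each instruction type, maps the
# number of enabled cores to an all-core turbo frequency in GHz.
# All values are illustrative placeholders, not figures for any real CPU SKU.
TURBO_TABLE = {
    "non_avx": {2: 3.6, 4: 3.5, 6: 3.3, 8: 3.2, 10: 3.1, 12: 3.0},
    "avx":     {2: 3.3, 4: 3.2, 6: 3.0, 8: 2.9, 10: 2.8, 12: 2.7},
}


def select_all_core_turbo(enabled_cores: int, instruction_type: str) -> float:
    """Return the highest all-core turbo frequency listed for this configuration."""
    column = TURBO_TABLE[instruction_type]
    if enabled_cores not in column:
        raise ValueError(f"no table entry for {enabled_cores} enabled cores")
    return column[enabled_cores]


def select_frequencies(first_cores: int, second_cores: int,
                       first_type: str, second_type: str) -> tuple[float, float]:
    """Pick one all-core turbo frequency for each of the two processors."""
    return (select_all_core_turbo(first_cores, first_type),
            select_all_core_turbo(second_cores, second_type))


# Example: 4 cores for frequency-sensitive threads on the first processor,
# 12 cores for AVX-heavy analytics on the second processor.
f1, f2 = select_frequencies(4, 12, "non_avx", "avx")
print(f1, f2)  # 3.5 2.7
```

Looking up the listed frequency for the exact enabled-core count mirrors the "highest frequency available" selection described in the summary; how to handle core counts missing from the table (for example, by falling back to the next larger listed count) is left open in this sketch.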

For example, the first and second numbers of enabled cores may be selected by a human user. The application may include a high frequency trading application, the first portion may include a feed handling and/or trading platform, and the second portion may include an analytics platform. The first number of enabled cores may be smaller than the second number of enabled cores, and the first all-core turbo frequency may be greater than the second all-core turbo frequency.

In some implementations, selecting the first and second all-core turbo frequencies may include using a turbo boost frequency table. The first all-core turbo frequency may be the highest frequency available for the first number of enabled cores in the turbo boost frequency table. Also, the type of instruction may include an AVX instruction.

In yet another illustrative, non-limiting embodiment, a non-transitory computer readable medium may have program instructions stored thereon that, upon execution by an IHS, cause the IHS to receive an indication of a type of instruction to be executed by a first CPU; and select a first all-core turbo frequency of the first CPU and a second all-core turbo frequency of a second CPU to reduce an execution jitter of an application including one or more instructions of the indicated type, where the first all-core turbo frequency is different from the second all-core turbo frequency, and where at least one of the first or second CPUs has at least one of its cores disabled. The program instructions may further cause the IHS to receive an indication of a first number of cores in the first CPU chosen to execute a first part of the application and of a second number of cores in the second CPU chosen to execute a second part of the application; and select the first and second all-core turbo frequencies based, at least in part, upon the first and second numbers of cores.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention(s) is/are illustrated by way of example and is/are not limited by the accompanying figures, in which like references indicate similar elements. Elements in the figures are illustrated for simplicity and clarity, and have not necessarily been drawn to scale.

FIG. 1 illustrates an example IHS configured to implement various systems and methods described herein according to some embodiments.

FIG. 2 illustrates an example of core configuration parameters being transmitted from a basic input/output system (BIOS) to processor control logic according to some embodiments.

FIG. 3 is a flowchart illustrating an example of a method by which cores are disabled and/or enabled for operation and jitter control in a single multicore processor according to some embodiments.

FIG. 4 is a flowchart illustrating an example of a method by which cores are disabled in a single multicore processor according to some embodiments.

FIG. 5 is a flowchart illustrating an example of a method by which an ordered lookup table is generated according to some embodiments.

FIG. 6 is an example ordered lookup table identifying the sequence of enabling the cores based on the total number of enabled cores in a single multicore processor according to some embodiments.

FIG. 7 is an example of a two-processor IHS configured to reduce execution jitter according to some embodiments.

FIG. 8 is an example of a turbo boost frequency table used to reduce execution jitter of parallel threads in multiprocessor applications according to some embodiments.

FIG. 9 is a flowchart illustrating an example of a method for reducing execution jitter of parallel threads in multiprocessor applications according to some embodiments.

FIG. 10 is a flowchart illustrating an example of a method for executing workload optimized multiprocessor applications according to some embodiments.

DETAILED DESCRIPTION

Systems and methods for reducing execution jitter of parallel threads in multiprocessor applications are described. Some software applications require very specific, precise, and/or deterministic code execution timing. Examples of such applications include, but are not limited to, real-time applications, financial trading applications (e.g., high-volume securities trading, automated securities trading, etc.), and control applications. Generally speaking, it is desirable that these applications have predictable execution times. When such an application is executed by an IHS having two or more multicore CPUs, however, certain non-deterministic code execution timing problems often arise.

As used herein, the term “execution jitter” refers to a difference in execution time for a given program or thread between the predicted execution time and the actual execution time at a given clock frequency. For example, if a given thread is predicted to execute in 10 milliseconds and some measured execution times are 8, 9, 11, and 13 milliseconds, the execution jitter is the difference between each measured time and 10 milliseconds (i.e., -2, -1, +1, and +3 milliseconds). Execution jitter can occur for threads that execute in either longer or shorter times than the predicted or desired execution times. To address these and other concerns, the systems and methods described herein reduce execution jitter in software programs running on two or more multicore processors or CPUs. In various applications, these systems and methods may be used to provide a workload optimized server for intelligent algorithm trading platforms or the like.
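
The jitter definition can be applied directly to the numbers in the example above. This is a small sketch; reporting the largest absolute deviation as a single summary figure is an assumption made for illustration, not a metric the disclosure prescribes.

```python
def execution_jitter(predicted_ms: float, measured_ms: list[float]) -> list[float]:
    """Signed difference between each measured execution time and the prediction."""
    return [m - predicted_ms for m in measured_ms]


jitter = execution_jitter(10, [8, 9, 11, 13])
print(jitter)                        # [-2, -1, 1, 3]
# Assumed summary statistic: the worst-case deviation in either direction.
print(max(abs(j) for j in jitter))   # 3
```

Keeping the sign shows that jitter covers threads finishing early as well as late, matching the definition above.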

Illustrative embodiments provide an IHS, a multi-core processor, and methods performed within the IHS that are suitable for: (1) reducing execution jitter in multi-core processors; (2) enabling one or more processor cores within a multi-core processor to operate at a pre-determined maximum or turbo frequency; (3) enabling two or more multi-core processors to operate at distinct, pre-determined maximum turbo frequencies; and (4) providing consistent execution times for threads running on multiple cores of multiple processors, for example, when one or more cores are disabled, that is, when one or more cores are powered down or powered off, as opposed to simply non-active or unused.

In the following detailed description of exemplary embodiments of the disclosure, specific exemplary embodiments in which the disclosure may be practiced are described in sufficient detail to enable a person of ordinary skill in the art to practice the disclosed embodiments. For example, specific details such as specific method orders, structures, elements, and connections have been presented herein. However, it is to be understood that the specific details presented need not be utilized to practice embodiments of the present disclosure. It is also to be understood that other embodiments may be utilized and that logical, architectural, programmatic, mechanical, electrical, and other changes may be made without departing from the general scope of the disclosure. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims and equivalents thereof.

References within the specification to “one embodiment,” “an embodiment,” “embodiments,” or “one or more embodiments” are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. The appearances of such phrases in various places within the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Further, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.

It is understood that the use of specific component, device and/or parameter names and/or corresponding acronyms thereof, such as those of the executing utility, logic, and/or firmware described herein, is for example only and not meant to imply any limitations on the described embodiments. The embodiments may thus be described with different nomenclature and/or terminology utilized to describe the components, devices, parameters, methods and/or functions herein, without limitation. References to any specific protocol or proprietary name in describing one or more elements, features or concepts of the embodiments are provided solely as examples of one implementation, and such references do not limit the extension of the claimed embodiments to embodiments in which different element, feature, protocol, or concept names are utilized. Thus, each term utilized herein is to be given its broadest interpretation given the context in which that term is utilized.

FIG. 1 illustrates a block diagram representation of an example IHS 100, within which one or more of the described features of the various embodiments of the disclosure can be implemented. For purposes of this disclosure, an information handling system, such as IHS 100, may include any instrumentality or aggregate of instrumentalities operable to compute, classify, process, transmit, receive, retrieve, originate, switch, store, display, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a handheld device, a personal computer, a server, a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, and a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.

Referring specifically to FIG. 1, example IHS 100 includes one or more processor(s) 102 coupled to system memory 130 via system interconnect 115. System interconnect 115 can be interchangeably referred to as a system bus, in one or more embodiments. System memory 130 can include therein a plurality of software and/or firmware modules including firmware (F/W) 132, basic input/output system (BIOS) 134, operating system (O/S) 136, and application(s) 138. The one or more software and/or firmware modules within system memory 130 can be loaded into processor(s) 102 during operation of IHS 100.

Processor(s) 102 include several processor cores, including core 0 104, core 1 106, core 2 108, core 3 110, core 4 112, core 5 114, core 6 116, and core 7 118. Cores 104-118 can communicate with each other and with control logic 120. Control logic 120 can control the operation of cores 104-118. According to an aspect of the described embodiments, control logic 120 may be configured to control the operating frequency and voltage or operating state of cores 104-118. Control logic 120 can also receive software and/or firmware modules from system memory 130 during the operation of processor(s) 102. In an embodiment, clock 121 is provided on processor(s) 102 and enables the generation of several different periodic frequency signals that can be applied to one or more of the cores 104-118 within one or more processor(s) 102.

IHS 100 further includes one or more input/output (I/O) controllers 140 which support connection by, and processing of signals from, one or more connected input device(s) 142, such as a keyboard, mouse, touch screen, or microphone. I/O controllers 140 also support connection to and forwarding of output signals to one or more connected output devices 144, such as a monitor or display device or audio speaker(s). Additionally, in one or more embodiments, one or more device interfaces 146, such as an optical reader, a universal serial bus (USB), a card reader, Personal Computer Memory Card International Association (PCMCIA) slot, and/or a high-definition multimedia interface (HDMI), can be associated with IHS 100. Device interface(s) 146 can be utilized to enable data to be read from or stored to corresponding removable storage device(s) 148, such as a compact disk (CD), digital video disk (DVD), flash drive, or flash memory card. Device interfaces 146 can further include General Purpose I/O interfaces such as I²C, SMBus, and peripheral component interconnect (PCI) buses.

IHS 100 comprises a network interface device (NID) 150. NID 150 enables IHS 100 to communicate and/or interface with other devices, services, and components that are located external to IHS 100. These devices, services, and components can interface with IHS 100 via an external network, such as example network 160, using one or more communication protocols. Network 160 can be a local area network, wide area network, personal area network, and the like, and the connection to and/or between network 160 and IHS 100 can be wired or wireless or a combination thereof. For purposes of discussion, network 160 is indicated as a single collective component for simplicity. However, it is appreciated that network 160 can comprise one or more direct connections to other devices as well as a more complex set of interconnections as can exist within a wide area network, such as the Internet.

A person of ordinary skill in the art will appreciate that the hardware components and basic configuration depicted in FIG. 1 and described herein may vary. For example, the illustrative components within IHS 100 are not intended to be exhaustive, but rather are representative to highlight components that can be utilized to implement systems and methods described herein. Other devices/components may also be used in addition to or in place of the hardware depicted. The depicted example does not convey or imply any architectural or other limitations with respect to the presently described embodiments and/or the general disclosure.

With reference now to FIG. 2, there is illustrated an embodiment of core configuration parameters 210 being transmitted from the basic input/output system (BIOS) 134 to processor control logic 120. In the discussion of FIG. 2, reference is also made to components illustrated in FIG. 1. During the initial startup of IHS 100 and processor(s) 102, core configuration parameters 210 are transmitted from BIOS 134 to processor control logic 120. Core configuration parameters 210 include operating states 212 for cores 104-118 for each of processor(s) 102.

According to one aspect of the disclosure, examples of operating states 212 may include, for each distinct one of processor(s) 102: (a) an identification of one or more cores selected to be enabled for operation at frequencies equal to or higher than the minimum clock frequency or core operating frequency; and (b) an identification of specific cores selected to be disabled, such that the disabled cores are not operational. Operating states 212 identify which of cores 104-118 are selected to be disabled and/or enabled, and which of cores 104-118 are to be controlled for execution jitter.

Operating frequencies that are higher than the “rated” core operating frequency are referred to as turbo states or frequencies. For example, if the rated core operating frequency is 2.0 GHz, operating states 212 can be set or pre-determined such that one or more cores 104-118 operate at higher core frequencies such as 2.5 GHz, 3.0 GHz, 3.5 GHz, 4.0 GHz, or other frequencies. That is, the rated frequency is the frequency the CPU stock keeping unit (SKU) is actually rated for, whereas turbo states are states that can be reached opportunistically to provide an operating frequency higher than the rated frequency, depending upon operating conditions, which may vary over time.

Accordingly, as used herein, the “reference,” “base,” or “rated” operating frequency is, as its name suggests, the base CPU frequency. In some situations, a CPU may run at a lower frequency than its rated frequency, for example, for power management reasons. In order for the CPU to run at a higher frequency than its rated frequency, turbo states may be used. Moreover, it should be noted that turbo frequencies may change as a function of the number of CPU cores that are enabled in a given configuration. For the sake of illustration only, assume that for a given 12-core CPU the rated frequency is 2.7 GHz and the all-core turbo frequency when all 12 cores are enabled is 3.0 GHz. In this example, when 4 cores are disabled and the CPU operates with 8 enabled cores (in a particular configuration), the rated frequency is still 2.7 GHz, but the all-core turbo frequency for the 8 enabled cores may be 3.2 GHz.
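
The 12-core illustration can be expressed as a small lookup. The 2.7 GHz, 3.0 GHz, and 3.2 GHz values are taken from the example above; the table layout and the `operating_point` helper are hypothetical.

```python
RATED_GHZ = 2.7  # rated (base) frequency of the example 12-core CPU

# All-core turbo frequency by number of enabled cores. Only the 12- and 8-core
# entries come from the example in the text; a real table would come from the
# processor vendor's specification for the particular SKU.
ALL_CORE_TURBO_GHZ = {12: 3.0, 8: 3.2}


def operating_point(enabled_cores: int) -> tuple[float, float]:
    """Return (rated, all-core turbo) frequencies for a core configuration."""
    # Disabling cores leaves the rated frequency unchanged; only the turbo ceiling moves.
    return RATED_GHZ, ALL_CORE_TURBO_GHZ[enabled_cores]


for n in (12, 8):
    rated, turbo = operating_point(n)
    print(f"{n} cores: rated {rated} GHz, all-core turbo {turbo} GHz")
```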

The maximum actual core frequency for each core is subject to on-chip limits in temperature, current, and power consumption. In one or more embodiments, core configuration parameters 210 also include an ordered lookup table 214 of the cores, in which the cores are ordered by the maximum physical distance separating each core on the chip or die. For example, as shown in FIG. 1, core 0 104 is physically located farther away from core 7 118 than from core 4 112. Ordered lookup table 214 may be used to select one or more cores 104-118 for operation.

Core configuration parameters 210 may be pre-determined by a user and stored in BIOS 134. For example, operating states 212 can direct four of cores 104-118 (e.g., core 0-core 3) to be disabled from operating and another (i.e., different) four of cores 104-118 (e.g., core 4-core 7) to be enabled for operation, and thus to operate at a higher core operating frequency.
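
One way to picture core configuration parameters 210 is as a small record handed from BIOS 134 to control logic 120. The sketch below is a hypothetical Python model (the class and field names are not from the disclosure) populated with the example above, in which cores 0-3 are disabled and cores 4-7 remain enabled.

```python
from dataclasses import dataclass, field


@dataclass
class CoreConfigurationParameters:
    """Hypothetical model of core configuration parameters 210 (names are illustrative)."""
    total_cores: int
    disabled: set[int] = field(default_factory=set)           # cores to be powered off
    jitter_controlled: set[int] = field(default_factory=set)  # cores to lock to a fixed frequency

    def enabled(self) -> list[int]:
        """Cores that remain enabled once the disabled ones are removed."""
        return [c for c in range(self.total_cores) if c not in self.disabled]


# Example from the text: cores 0-3 disabled, cores 4-7 enabled.
params = CoreConfigurationParameters(total_cores=8, disabled={0, 1, 2, 3})
print(params.enabled())  # [4, 5, 6, 7]
```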

FIG. 3 illustrates a flowchart of exemplary methods by which (a) cores are disabled and enabled for operation and (b) cores are controlled for reducing execution jitter. Generally, method 300 represents a computer-implemented method to reduce execution jitter in a single multi-core processor and to enable the processor's cores to be operated at higher operating frequencies. In the discussion of FIG. 3, reference is also made to components illustrated in FIGS. 1 and 2.

According to some embodiments, disabled cores do not execute instructions and do not generate heat, while enabled cores operate at a higher frequency that is variable depending upon processor workloads and other factors that are internal to and based on the design of the processors. The foregoing distinction between enabled and disabled cores is presented in contrast with active versus non-active (or idle) cores. When a core is merely idle but is nonetheless enabled, it may not be used to execute threads, but it still generates heat. Among enabled cores, jitter-controlled cores are set to a pre-determined clock frequency, which can be specified by a user, and different jitter-controlled cores can have different clock frequencies.

In some embodiments, when some of the cores in a given CPU package are disabled, the CPU's architecture may allow the remaining enabled cores to run at a “Max-All-Core-Turbo-Frequency.” This frequency is typically higher when fewer than the maximum number of cores are enabled. That is, in such CPU architectures, the fewer the enabled cores, the higher the all-core maximum turbo frequency.

Method 300 begins at the start block and proceeds to block 302, at which control logic 120 determines if any of cores 104-118 are to be disabled from operation. Disabled cores are identified through the use of core configuration parameters 210 received from BIOS 134. Disabled cores do not operate and thus do not execute any instructions or generate heat. According to an embodiment, the rated or base operating frequency is the default or reference operating frequency for the cores. In response to none of cores 104-118 being selected to be disabled, control logic 120 determines if any of cores 104-118 are to be jitter controlled (block 308). In response to none of cores 104-118 being selected to be jitter controlled, method 300 ends.

In response to one or more of cores 104-118 being selected to be jitter controlled, however, the one or more cores selected for jitter control are set by control logic 120 to operate at a maximum operating or turbo frequency that is dependent on the number of cores in operation (block 312). In some embodiments, control logic 120 sets the maximum operating frequency based upon pre-determined operating states 212. In other embodiments, control logic 120 sets the maximum operating frequency of the jitter-controlled cores to the reference frequency or minimum core operating frequency. Moreover, in various embodiments, the maximum operating or turbo frequencies of all enabled cores within a given processor may be set to the same value, herein referred to as an “all-core turbo frequency” for that processor. Method 300 then terminates at the end block.

In response to one or more of cores 104-118 being requested or selected to be disabled in block 302, control logic 120 disables the selected cores from operating at block 304 and determines the core operating frequency or turbo states for the enabled cores (block 306). Control logic 120 determines if any of the enabled cores are to be jitter controlled (block 310). In response to none of the enabled cores being selected to be jitter controlled, method 300 ends. At block 314, in response to one or more of the enabled cores being selected to be jitter controlled, control logic 120 sets or locks the cores selected for jitter control to operate at the maximum operating frequency or turbo state previously determined at block 306. Method 300 then terminates at the end block.
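
The two branches of method 300 can be condensed into a short sketch. It is a hypothetical Python model, not firmware: integer core IDs stand in for cores 104-118, `turbo_for_count` is an assumed table mapping the number of enabled cores to an all-core turbo frequency, and the embodiment that locks jitter-controlled cores to the reference frequency instead is not modeled.

```python
def method_300(all_cores, disabled, jitter_controlled, turbo_for_count):
    """Return (enabled cores, {jitter-controlled core: locked frequency in GHz})."""
    # Blocks 302/304: disable the selected cores; with none selected, all stay enabled.
    enabled = [c for c in all_cores if c not in disabled]
    # Blocks 306/312: the all-core turbo frequency depends on how many cores remain enabled.
    all_core_turbo = turbo_for_count[len(enabled)]
    # Blocks 308/310 and 312/314: lock each enabled, jitter-controlled core to that frequency.
    locked = {c: all_core_turbo for c in enabled if c in jitter_controlled}
    return enabled, locked


# Hypothetical all-core turbo table (GHz) for an 8-core processor.
TURBO_FOR_COUNT = {8: 3.0, 4: 3.5}

enabled, locked = method_300(range(8), {0, 1, 2, 3}, {4, 5, 6, 7}, TURBO_FOR_COUNT)
print(enabled)  # [4, 5, 6, 7]
print(locked)   # {4: 3.5, 5: 3.5, 6: 3.5, 7: 3.5}
```

When no core is disabled, the lookup uses the full core count, covering the block 308/312 branch; when no core is jitter controlled, the returned mapping is empty, matching the early exits at blocks 308 and 310.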

Method 300 allows a set of instructions or threads to execute across multiple cores with both fast execution times and consistent execution times (i.e., no jitter) within a single processor. With a set of one or more cores 104-118 (e.g., core 4-core 7) fixed to operate at a predetermined operating frequency, the predicted execution time and the actual execution time will be the same, resulting in no execution jitter. For example, if 3.5 GHz is the highest clock frequency at which a set of instructions or threads executes with consistent execution times (no jitter) on multiple cores, method 300 can set or restrict two or more of cores 104-118 to operate at 3.5 GHz.
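
The arithmetic behind this can be illustrated with a simplified model in which a thread needs a fixed number of clock cycles, so its execution time is the cycle count divided by the clock frequency. The cycle count and the frequency sequence below are hypothetical, and the model ignores effects such as memory stalls that can also make execution times vary.

```python
def execution_time_us(cycles: int, freq_ghz: float) -> float:
    """Execution time in microseconds: 1 GHz corresponds to 1,000 cycles per microsecond."""
    return cycles / (freq_ghz * 1_000)


CYCLES = 3_500_000  # hypothetical, fixed cycle count for one thread

# Frequency locked at 3.5 GHz on every run.
locked_runs = [execution_time_us(CYCLES, 3.5) for _ in range(4)]
# Opportunistic turbo: the frequency differs from run to run (hypothetical values).
turbo_runs = [execution_time_us(CYCLES, f) for f in (3.0, 3.5, 4.0, 3.5)]

print(locked_runs)                        # [1000.0, 1000.0, 1000.0, 1000.0]
print([round(t, 1) for t in turbo_runs])  # [1166.7, 1000.0, 875.0, 1000.0]
```

With the frequency locked, every run takes the predicted 1,000 microseconds; with the frequency drifting, the same work spans roughly 875 to 1,167 microseconds in this model.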

Turning now to FIG. 4, a flowchart of a method 400 by which cores are enabled in a single multicore processor is shown. In the discussion of FIG. 4, reference is also made to components illustrated in FIGS. 1 and 2. Method 400 begins at the start block and proceeds to block 402, where lookup table 214 is loaded into control logic 120. Lookup table 214 contains an ordered table of cores 104-118 ordered by the maximum physical distance or spacing on the chip or die (see, for example, the maximum separation distance between core 0 104 and core 7 118 within processor(s) 102 of FIG. 1). Control logic 120 enables a first one of cores 104-118 (e.g., core 0 104) for operation in the order defined by lookup table 214 (block 404).

At block 406, control logic 120 determines if the requested or selected number of cores has been enabled for operation. In some implementations, the number of cores selected to be enabled for operation is determined by core configuration parameters 210 received from BIOS 134. In response to the selected number of cores being enabled, method 400 ends. In response to the selected number of cores not being enabled, method 400 returns to block 404, where control logic 120 enables the next core for turbo state operation in the order defined by lookup table 214.
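
Method 400 amounts to walking the ordered lookup table and enabling cores until the requested count is reached. The sketch below assumes a hypothetical core order; in practice the order would come from lookup table 214 and the count from core configuration parameters 210.

```python
def method_400(lookup_table: list[int], requested: int) -> list[int]:
    """Enable cores in lookup-table order until `requested` cores are enabled."""
    if not 0 <= requested <= len(lookup_table):
        raise ValueError("requested core count is out of range")
    enabled: list[int] = []
    for core in lookup_table:
        if len(enabled) == requested:   # block 406: requested number reached
            break
        enabled.append(core)            # block 404: enable the next core in order
    return enabled


# Hypothetical order for an 8-core die (not taken from FIG. 6).
ORDER = [0, 7, 2, 5, 1, 3, 4, 6]
print(method_400(ORDER, 3))  # [0, 7, 2]
```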

FIG. 5 illustrates a flowchart of a method 500 for generating an ordered lookup table 214. In the discussion of FIG. 5, reference is also made to components illustrated in FIGS. 1 and 2. Method 500 begins at the start block and proceeds to block 502, where one of cores 104-118 is selected by control logic 120. The selected core is placed into lookup table 214 (block 504). At decision block 506, control logic 120 determines if all of the required cores 104-118 have been placed into lookup table 214. In response to all of the cores being placed into lookup table 214, method 500 terminates. In response to there being other cores remaining to be placed into lookup table 214, control logic 120 selects (at block 508) the next core with maximum physical spacing distance from the previously selected core(s) in the processor (e.g., processor(s) 102, FIG. 1), and control logic 120 places the selected next core into lookup table 214 (block 504). In some cases, the next core with maximum physical spacing distance is selected from among the other non-selected cores (i.e., cores that are not yet placed in the lookup table).
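
Block 508 can be read as a greedy farthest-point selection: each new entry is the unplaced core whose distance to the nearest already-placed core is largest. That reading, the 2 x 4 grid of core coordinates, and the lowest-ID tie-break are assumptions for this sketch; the disclosure does not specify a die layout or a distance metric. In the assumed layout, core 0 is farther from core 7 than from core 4, consistent with FIG. 1.

```python
import math

# Hypothetical die layout: 8 cores on a 2 x 4 grid, core i at (i % 4, i // 4).
POSITIONS = {i: (i % 4, i // 4) for i in range(8)}


def method_500(positions: dict[int, tuple[int, int]], first: int = 0) -> list[int]:
    """Order cores so each new entry is as far as possible from those already placed."""
    table = [first]                       # blocks 502/504: select and place a first core
    remaining = set(positions) - {first}
    while remaining:                      # block 506: cores left to place?
        # Block 508: pick the unplaced core whose nearest placed core is farthest
        # away; iterating in sorted order makes ties go to the lowest core ID.
        nxt = max(sorted(remaining),
                  key=lambda c: min(math.dist(positions[c], positions[p]) for p in table))
        table.append(nxt)                 # block 504: place it in the table
        remaining.remove(nxt)
    return table


table = method_500(POSITIONS)
print(table)  # [0, 7, 2, 5, 1, 3, 4, 6]
```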

Referring to FIG. 6, an embodiment of ordered lookup table 214 generated by method 500 of FIG. 5 is shown, presenting an increasing number of enabled cores and the corresponding enabled cores (at maximum physical spacing distance). Ordered lookup table 214 is based on the cores with maximum physical spacing from each other. In the discussion of FIG. 6, reference is also made to components illustrated in FIGS. 1, 2, and 5. Lookup table 214 includes a first column, number of enabled cores 602, indicating the different numbers of cores that can be enabled, and a second column, enabled core(s) 604, identifying the specific core(s) that are enabled as the number of enabled cores increases.
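
Since method 400 enables cores one at a time in table order, each row of the table can be read as the previous row plus one more core, so both columns follow from a single ordered list. A hypothetical sketch, assuming integer core IDs and an illustrative order rather than the actual contents of FIG. 6:

```python
from typing import List, Tuple


def lookup_table_rows(order: List[int]) -> List[Tuple[int, List[int]]]:
    """Return (number of enabled cores, enabled core IDs) rows, as in FIG. 6."""
    # Row n enables the first n cores of the farthest-first order, so each
    # row extends the previous one by exactly one core.
    return [(n, order[:n]) for n in range(1, len(order) + 1)]


# Hypothetical farthest-first order for an 8-core die.
for count, cores in lookup_table_rows([0, 7, 3, 5, 1, 2, 4, 6]):
    print(f"{count:>2} | {', '.join(str(c) for c in cores)}")
```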

FIG. 7 is an example of a two-processor IHS 700 configured to reduce execution jitter. In the discussion of FIG. 7, reference is also made to components illustrated in FIGS. 1 and 2. Particularly, CPU 701 and CPU 702 may be part of processor(s) 102; memory 703 and memory 704 may be portions of system memory 130 dedicated to CPUs 701 and 702, respectively; and NID 150 is also shown. In some cases, CPU 701 and CPU 702 may share the same part number, and therefore CPU 701 may have the same number of cores as CPU 702. In other cases, CPUs 701 and 702 may be different parts with different numbers of cores.

In some implementations, CPUs 701 and 702 may be configured to execute a high frequency trading (HFT) application or the like. HFT is the automated, rapid trading of securities. HFT uses specialized trading algorithms to move in and out of short-term positions in seconds (or fractions of a second), aiming to capture small profits on every trade. IHSs designed for HFT can sometimes achieve round-trip order execution times (from hitting “transmit order” to receiving an acknowledgment) on the order of low single-digit to low double-digit microseconds, and therefore require very low latencies or high operating frequencies with reduced execution jitter.

For example, a first portion of the application may include feed handling, which receives data from a securities exchange. A second portion of the application may include an analytics platform configured to analyze the incoming data and make trading decisions. A third portion of the application may include a trading platform that sends trading orders (e.g., buy, sell, etc.) to the exchange.

In some cases, the feed handling and/or trading platform may be executed by CPU 701, and HFT data may be sent or received via NID 150. The analytics platform may in turn be executed by CPU 702. These various portions may run concurrently, as parallel threads. Moreover, it may be desirable that CPU 702 use a higher number of cores than CPU 701 (A<B). For example, a given number of cores may be disabled in CPU 701 and a different number of cores may be disabled in CPU 702. Alternatively, all cores may be enabled in CPU 702.

Additionally or alternatively, it may be desirable that CPU 701 have a higher all-core maximum or turbo frequency than CPU 702 (X>Y). By enabling (e.g., via the BIOS) fewer cores in CPU 701, a higher turbo frequency may be reached for each enabled core in that CPU. Also, in some cases, the clock frequencies of CPUs 701 and 702 may be selected based upon the type of instruction expected to be executed by each CPU (e.g., Advanced Vector Extensions (AVX) instructions, or any other high-power instructions now existing or yet to be developed, versus non-AVX or default instructions). These frequencies may be tuned using a turbo boost frequency table or the like, together with the reduced execution jitter methods described previously.

FIG. 8 is an example of turbo boost frequency table 800 used to reduce execution jitter of parallel threads in multiprocessor applications according to some embodiments. In the discussion of FIG. 8, reference is also made to components illustrated in FIG. 7. As shown, the frequencies in table 800 account for the number of cores being used (e.g., 4, 8, 12, or 16) as well as the type of instruction executed by those cores (e.g., AVX or non-AVX). In some implementations, every enabled core in a given one of CPUs 701 or 702 operates at the same execution jitter-free clock. Furthermore, one or more cores in a given one of CPUs 701 or 702 may be disabled.

For example, both CPUs 701 and 702 may have 16 available cores. If CPU 701 is selected to execute non-AVX code with 4 enabled cores (that is, 12 of the 16 cores are disabled or turned off), the all-core maximum turbo frequency for each of the 4 enabled cores may be set to 2.6 GHz. Conversely, if CPU 702 is selected to execute AVX code with 16 enabled cores (no cores are disabled), its all-core maximum turbo frequency may be set to 2 GHz. In combination, the selected clock frequencies for CPUs 701 and 702 may provide reduced or minimized execution jitter for high-end, low-latency applications such as HFT platforms or the like.
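
Table 800 can be modeled as a mapping keyed by core count and instruction type. The sketch below is hypothetical: the names are invented, and it holds only the two frequencies given in this example, because the remaining entries of table 800 are not reproduced in the text and depend on the specific CPU part.

```python
from typing import Dict, Tuple

# (enabled cores, AVX code?) -> all-core turbo frequency in GHz.
# Only the two values quoted in the example are filled in; other entries
# of table 800 (e.g., 8 or 12 cores) depend on the specific CPU part.
TURBO_TABLE_GHZ: Dict[Tuple[int, bool], float] = {
    (4, False): 2.6,   # CPU 701: 4 enabled cores running non-AVX code
    (16, True): 2.0,   # CPU 702: 16 enabled cores running AVX code
}


def all_core_turbo_ghz(enabled_cores: int, uses_avx: bool) -> float:
    """Look up the all-core turbo frequency for a core count and instruction type."""
    key = (enabled_cores, uses_avx)
    if key not in TURBO_TABLE_GHZ:
        raise LookupError(f"no table entry for {enabled_cores} cores, AVX={uses_avx}")
    return TURBO_TABLE_GHZ[key]


print(all_core_turbo_ghz(4, False), all_core_turbo_ghz(16, True))  # 2.6 2.0
```

Reading a fixed frequency from the table, rather than letting it float with load, is what keeps every enabled core in a CPU on the same clock.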

FIG. 9 is a flowchart illustrating an example of method 900 for reducing execution jitter of parallel threads in multiprocessor applications. In some embodiments, method 900 may be performed, at least in part, by BIOS 134 of FIG. 1. Method 900 begins at the start block and proceeds to block 901, where a user may select one of a plurality of CPUs (e.g., CPU 701 of FIG. 7). At block 902, the user may select a number of cores to be used by the selected CPU. At block 903, the user may select the type of instruction to be executed by those cores. Additionally or alternatively, the type of instruction may be identified at runtime. Then, at block 904, method 900 includes setting, for all cores in the selected CPU, an operating frequency that provides jitter-free or reduced-jitter operation given the number of cores and/or instruction type, for example, using a turbo boost frequency table such as table 800 in FIG. 8. At block 905, if additional CPUs are being set up, control returns to block 901, where another CPU may be selected. Otherwise, method 900 ends.
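
Blocks 901 through 905 amount to a loop over CPUs that records a core count and one table-derived frequency per CPU. A minimal sketch under stated assumptions: `CpuSetup` and `configure_cpus` are hypothetical names, the table excerpt contains only the two values from the FIG. 8 example, and the function returns a plan instead of programming hardware as BIOS 134 would.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical excerpt of table 800: (enabled cores, AVX?) -> GHz.
TURBO_TABLE_GHZ: Dict[Tuple[int, bool], float] = {(4, False): 2.6, (16, True): 2.0}


@dataclass
class CpuSetup:
    cpu_id: int          # Block 901: CPU chosen by the user.
    enabled_cores: int   # Block 902: number of cores to use.
    uses_avx: bool       # Block 903: instruction type (or detected at runtime).


def configure_cpus(setups: List[CpuSetup]) -> Dict[int, Tuple[int, float]]:
    """Return {cpu_id: (enabled cores, all-core frequency in GHz)}."""
    plan: Dict[int, Tuple[int, float]] = {}
    for s in setups:  # Block 905: repeat while more CPUs are being set up.
        # Block 904: one frequency shared by every enabled core in this CPU.
        ghz = TURBO_TABLE_GHZ[(s.enabled_cores, s.uses_avx)]
        plan[s.cpu_id] = (s.enabled_cores, ghz)
    return plan


print(configure_cpus([CpuSetup(701, 4, False), CpuSetup(702, 16, True)]))
# {701: (4, 2.6), 702: (16, 2.0)}
```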

Accordingly, in some embodiments, each CPU socket may be tuned to its specific workload threads, which enables tuning (optimized tradeoffs) for high single-threaded performance on one socket while allowing tuning for highly parallel threads on a second socket. These systems and methods also provide the ability to customize the number of cores and their frequency per CPU in a jitter-free manner. As such, they enable a customer to optimize the number of cores on a selected socket for higher performance (faster execution using higher all-core turbo frequencies) versus parallel performance (volume of instructions), and also enable the use of mixed CPUs in the IHS's configuration. Additionally or alternatively, these systems and methods may allow customers to optimize total system performance by scheduling AVX-enabled threads on one socket while running non-AVX threads on another socket, thereby achieving jitter-free operation on all threads on both sockets without sacrificing performance for non-AVX threads.
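
The socket split described above, with AVX threads on one socket and non-AVX threads on the other, can be sketched as a simple partition. The names, the default socket IDs (borrowed from CPUs 701 and 702), and the per-thread `uses_avx` flag are assumptions for illustration; how a thread's instruction mix is actually detected is not specified here.

```python
from typing import Dict, List, NamedTuple


class WorkerThread(NamedTuple):
    name: str
    uses_avx: bool


def place_by_instruction_type(threads: List[WorkerThread],
                              non_avx_socket: int = 701,
                              avx_socket: int = 702) -> Dict[int, List[str]]:
    """Put all AVX threads on one socket and all non-AVX threads on the other."""
    placement: Dict[int, List[str]] = {non_avx_socket: [], avx_socket: []}
    for t in threads:
        # Keeping AVX work off the non-AVX socket lets that socket run at the
        # (higher) non-AVX all-core frequency from its table entry.
        placement[avx_socket if t.uses_avx else non_avx_socket].append(t.name)
    return placement
```

For example, hypothetical threads `feed_handler` and `order_entry` (non-AVX) would land on socket 701, while an AVX-heavy `model_a` would land on socket 702.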

FIG. 10 is a flowchart illustrating an example of method 1000 for executing workload optimized multiprocessor applications according to some embodiments. In the discussion of FIG. 10, reference is also made to components illustrated in FIG. 7. Method 1000 begins at the start block and proceeds to block 1001, where a user may select an application to be run by processors or CPUs 701 and 702. As described with respect to FIG. 7, such an application may have a number of portions, each portion having threads that are then scheduled to be executed by a given one of CPUs 701 or 702. In some cases, the first portion of the application may include frequency sensitive threads, whereas the second portion of the application may include parallel execution sensitive threads. For example, the application may include a high frequency trading application, a first portion of that application may include a feed handling or trading platform, and a second portion of that application may include an analytics platform.

At block 1002, method 1000 may determine whether a particular thread is classified as frequency sensitive (that is, a thread for which speed of execution is important or critical). If so, the given thread is assigned to first CPU 701 at block 1003 (fewer enabled cores; higher all-core turbo frequency). Otherwise, block 1004 determines whether the thread is classified as parallel execution sensitive (that is, one for which a number of other threads need to be executed in parallel, concurrently, or simultaneously, for example, to collaborate or cooperate with each other). If so, the given thread is assigned to second CPU 702 (more enabled cores; lower all-core turbo frequency). If not, control passes to block 1006, where the next thread is selected, if one exists, and the process is repeated for each subsequent thread. Otherwise, method 1000 ends. In some cases, when a thread has not been classified as either frequency or parallel execution sensitive, conventional methods for assigning that thread to a given CPU (e.g., load balancing) may be used.
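
The classification in blocks 1002 through 1006 can be sketched as a per-thread dispatch. This is a hypothetical illustration: the enum, the thread names, and the least-loaded fallback (standing in for the unspecified "conventional methods") are assumptions, and the result is a name-to-CPU map rather than actual OS affinity settings.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

FIRST_CPU = 701   # fewer enabled cores, higher all-core turbo frequency
SECOND_CPU = 702  # more enabled cores, lower all-core turbo frequency


class Sensitivity(Enum):
    FREQUENCY = "frequency"  # speed of execution is critical
    PARALLEL = "parallel"    # must run alongside many cooperating threads


def assign_threads(threads: List[Tuple[str, Optional[Sensitivity]]]) -> Dict[str, int]:
    """Map each thread name to a CPU, following blocks 1002-1006."""
    load = {FIRST_CPU: 0, SECOND_CPU: 0}
    placement: Dict[str, int] = {}
    for name, kind in threads:               # Block 1006: next thread, if any.
        if kind is Sensitivity.FREQUENCY:    # Block 1002
            cpu = FIRST_CPU                  # Block 1003
        elif kind is Sensitivity.PARALLEL:   # Block 1004
            cpu = SECOND_CPU
        else:
            # Unclassified thread: a simple least-loaded rule stands in for
            # the conventional assignment methods (e.g., load balancing).
            cpu = min(load, key=lambda c: load[c])
        placement[name] = cpu
        load[cpu] += 1
    return placement


print(assign_threads([
    ("feed_handler", Sensitivity.FREQUENCY),
    ("analytics_0", Sensitivity.PARALLEL),
    ("analytics_1", Sensitivity.PARALLEL),
    ("logger", None),
]))
# {'feed_handler': 701, 'analytics_0': 702, 'analytics_1': 702, 'logger': 701}
```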

In the above-described flowcharts, one or more of the methods may be embodied in a computer readable medium containing computer readable code such that a series of functional processes are performed when the computer readable code is executed on a computing device. In some implementations, certain steps of the methods are combined, performed simultaneously or in a different order, or perhaps omitted, without deviating from the scope of the disclosure. Thus, while the method blocks are described and illustrated in a particular sequence, use of a specific sequence of functional processes represented by the blocks is not meant to imply any limitations on the disclosure. Changes may be made with regard to the sequence of processes without departing from the scope of the present disclosure. Use of a particular sequence is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined only by the appended claims.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language, without limitation. These computer program instructions may be provided to a processor of a general-purpose computer, special purpose computer, such as a service processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, perform the method for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

As will be further appreciated, the processes in embodiments of the present disclosure may be implemented using any combination of software, firmware or hardware. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment or an embodiment combining software (including firmware, resident software, micro-code, etc.) and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable storage device(s) having computer readable program code embodied thereon. Any combination of one or more computer readable storage device(s) may be utilized. The computer readable storage device may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage device would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage device may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

While the disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the disclosure. In addition, many modifications may be made to adapt a particular system, device, or component thereof to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiments disclosed for carrying out this disclosure, but that the disclosure will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance; rather, the terms first, second, etc. are used to distinguish one element from another.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the disclosure. The described embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable a person of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

The invention claimed is:
 1. An Information Handling System (IHS), comprising: a plurality of Central Processing Units (CPUs); and a memory having program instructions that, upon execution, cause the IHS to: set a first number of cores in a first CPU to operate with a first frequency; and set a second number of cores in a second CPU to operate with a second frequency, wherein the first number of cores is different from the second number of cores, and wherein at least one of the first or second frequencies is selected to cause the IHS to operate with reduced execution jitter.
 2. The IHS of claim 1, wherein the memory is part of a Basic Input/Output System (BIOS), wherein a first portion of an application is run by the first CPU, and wherein a second portion of the application is run by the second CPU.
 3. The IHS of claim 2, wherein the first portion of the application includes frequency sensitive threads and wherein the second portion of the application includes parallel execution sensitive threads.
 4. The IHS of claim 3, wherein the application includes a high frequency trading application, wherein the first portion includes a feed handling and/or trading platform, and wherein the second portion includes an analytics platform.
 5. The IHS of claim 1, wherein the first number of cores is smaller than the second number of cores, and wherein the first frequency is greater than the second frequency.
 6. The IHS of claim 1, wherein the program instructions, upon execution by the control logic, further cause the IHS to select the first and second frequencies using a table.
 7. The IHS of claim 6, wherein the first frequency is a highest frequency available for the first number of cores in the table.
 8. The IHS of claim 1, wherein the program instructions, upon execution by the control logic, further cause the IHS to: change at least one of the first number of cores or the second number of cores; and change at least one of the first or second frequencies to reduce the execution jitter.
 9. The IHS of claim 1, wherein the program instructions, upon execution by the control logic, further cause the IHS to schedule all Advanced Vector Extensions (AVX) threads on the second CPU instead of the first CPU.
 10. A computer-implemented method, comprising: receiving an indication of: (a) a first number of cores in a first processor of a multi-processor Information Handling System (IHS) chosen to execute a first part of an application; (b) a second number of cores in a second processor of the multi-processor IHS chosen to execute a second part of the application; and (c) a type of instruction to be executed within the first or second parts of the application; and selecting a first frequency of the first processor and a second frequency of the second processor to reduce an execution jitter of the application during concurrent execution of the first and second portions, the selection based upon the first number of cores, the second number of cores, and the type of instruction, wherein the first number of cores is different from the second number of cores.
 11. The computer-implemented method of claim 10, wherein the first and second number of cores are selected by a human user.
 12. The computer-implemented method of claim 10, wherein the application includes a high frequency trading application, wherein the first portion includes a feed handling and/or trading platform, and wherein the second portion includes an analytics platform.
 13. The computer-implemented method of claim 10, wherein the first number of cores is smaller than the second number of cores, and wherein the first frequency is greater than the second frequency.
 14. The computer-implemented method of claim 10, wherein selecting the first and second all-core turbo frequencies includes using a table.
 15. The computer-implemented method of claim 10, wherein the first all-core turbo frequency is a highest frequency available for the first number of cores in the table.
 16. The computer-implemented method of claim 10, wherein the type of instructions includes an Advanced Vector Extensions (AVX) instruction.
 17. A hardware memory storage device having program instructions stored thereon that, upon execution by an Information Handling System (IHS), cause the IHS to: receive an indication of a type of instruction to be executed by a first CPU; and select a first frequency of the first CPU and a second frequency of a second CPU to reduce an execution jitter of an application including one or more instructions of the indicated type, wherein the first CPU comprises a first number of cores, the second CPU comprises a second number of cores, and the first number of cores is different from the second number of cores.
 18. The hardware memory storage device of claim 17, wherein the type of instructions includes an Advanced Vector Extensions (AVX) instruction.
 19. The hardware memory storage device of claim 17, wherein the program instructions, upon execution by the IHS, further cause the IHS to: receive an indication of a first number of cores in the first CPU being chosen to execute a first part of an application and of a second number of cores in the second CPU being chosen to execute a second part of the application; and select the first and second frequencies based, at least in part, upon the first and second number of cores.
 20. The hardware memory storage device of claim 19, wherein the application includes a high frequency trading application, wherein the first portion includes a feed handling and/or trading platform, and wherein the second portion includes an analytics platform.